AITopics

Country: North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Neural Information Processing SystemsFeb-12-2026, 02:06:23 GMT

48237d9f2dea8c74c2a72126cf63d933-AuthorFeedback.pdf

Ouranalysishelps13 illustrate why this worst-case behavior does not attain in practice. The standard union bound does not take advantage of model similarity; the refinement in equation (3) is the24 basis for our calculations that highlight the beneficial effect of model similarity. Our bounds demonstrate a link between similarity and protection against overfitting in both the30 non-adaptiveandadaptivecase. We agree an important direction for future work is exploring the extent to which our findings transfer to other44 settings. A concrete next step is evaluating model similarity on data from Kaggle competitions, which includes45 adiversesetofsample sizes, model classes, anddata types.

artificial intelligence, machine learning, model similarity, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

arXiv.org Artificial IntelligenceOct-30-2025

Jailbreak Transferability Emerges from Shared Representations

Angell, Rico, Brinkmann, Jannik, He, He

Jailbreak transferability is the surprising phenomenon when an adversarial attack compromising one model also elicits harmful responses from other models. Despite widespread demonstrations, there is little consensus on why transfer is possible: is it a quirk of safety training, an artifact of model families, or a more fundamental property of representation learning? We present evidence that transferability emerges from shared representations rather than incidental flaws. Across 20 open-weight models and 33 jailbreak attacks, we find two factors that systematically shape transfer: (1) representational similarity under benign prompts, and (2) the strength of the jailbreak on the source model. To move beyond correlation, we show that deliberately increasing similarity through benign only distillation causally increases transfer. Our qualitative analyses reveal systematic transferability patterns across different types of jailbreaks. For example, persona-style jailbreaks transfer far more often than cipher-based prompts, consistent with the idea that natural-language attacks exploit models' shared representation space, whereas cipher-based attacks rely on idiosyncratic quirks that do not generalize. Together, these results reframe jailbreak transfer as a consequence of representation alignment rather than a fragile byproduct of safety training.

large language model, machine learning, natural language, (20 more...)

2506.12913

Country: Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (0.88)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Neural Information Processing SystemsOct-2-2025, 16:07:06 GMT

Model Similarity Mitigates Test Set Overuse

Horia Mania, John Miller, Ludwig Schmidt, Moritz Hardt, Benjamin Recht

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, similarity, (17 more...)

Country: North America (0.28)

Genre: Research Report (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Neural Information Processing SystemsOct-2-2025, 16:06:51 GMT

Reviewer

We would appreciate any pointers to additional related work in the updated review.

artificial intelligence, model similarity, natural language, (15 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.31)

Zhang, Guanhua, Dorner, Florian E., Hardt, Moritz

How Benchmark Prediction from Fewer Data Misses the Mark

arXiv.org Artificial IntelligenceJun-10-2025

Large language model (LLM) evaluation is increasingly costly, prompting interest in methods that speed up evaluation by shrinking benchmark datasets. Benchmark prediction (also called efficient LLM evaluation) aims to select a small subset of evaluation points and predict overall benchmark performance from that subset. In this paper, we systematically assess the strengths and limitations of 11 benchmark prediction methods across 19 diverse benchmarks. First, we identify a highly competitive baseline: Take a random sample and fit a regression model on the sample to predict missing entries. Outperforming most existing methods, this baseline challenges the assumption that careful subset selection is necessary for benchmark prediction. Second, we discover that all existing methods crucially depend on model similarity. They work best when interpolating scores among similar models. The effectiveness of benchmark prediction sharply declines when new models have higher accuracy than previously seen models. In this setting of extrapolation, none of the previous methods consistently beat a simple average over random samples. To improve over the sample average, we introduce a new method inspired by augmented inverse propensity weighting. This method consistently outperforms the random sample average even for extrapolation. However, its performance still relies on model similarity and the gains are modest in general. This shows that benchmark prediction fails just when it is most needed: at the evaluation frontier, where the goal is to evaluate new models of unknown capabilities.

large language model, machine learning, natural language, (18 more...)

2506.07673

Country: Europe (0.67)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Goel, Shashwat, Struber, Joschka, Auzina, Ilze Amanda, Chandra, Karuna K, Kumaraguru, Ponnurangam, Kiela, Douwe, Prabhu, Ameya, Bethge, Matthias, Geiping, Jonas

Great Models Think Alike and this Undermines AI Oversight

arXiv.org Artificial IntelligenceFeb-6-2025

As Language Model (LM) capabilities advance, evaluating and supervising them at scale is getting harder for humans. There is hope that other language models can automate both these tasks, which we refer to as "AI Oversight". We study how model similarity affects both aspects of AI oversight by proposing a probabilistic metric for LM similarity based on overlap in model mistakes. Using this metric, we first show that LLM-as-a-judge scores favor models similar to the judge, generalizing recent self-preference results. Then, we study training on LM annotations, and find complementary knowledge between the weak supervisor and strong student model plays a crucial role in gains from "weak-to-strong generalization". As model capabilities increase, it becomes harder to find their mistakes, and we might defer more to AI oversight. However, we observe a concerning trend -- model mistakes are becoming more similar with increasing capabilities, pointing to risks from correlated failures. Our work underscores the importance of reporting and correcting for model similarity, especially in the emerging paradigm of AI oversight.

large language model, machine learning, qwen2, (21 more...)

2502.04313

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > New York (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.78)

arXiv.org Artificial IntelligenceMar-29-2024

FedAC: An Adaptive Clustered Federated Learning Framework for Heterogeneous Data

Zhang, Yuxin, Chen, Haoyu, Lin, Zheng, Chen, Zhe, Zhao, Jin

Clustered federated learning (CFL) is proposed to mitigate the performance deterioration stemming from data heterogeneity in federated learning (FL) by grouping similar clients for cluster-wise model training. However, current CFL methods struggle due to inadequate integration of global and intra-cluster knowledge and the absence of an efficient online model similarity metric, while treating the cluster count as a fixed hyperparameter limits flexibility and robustness. In this paper, we propose an adaptive CFL framework, named FedAC, which (1) efficiently integrates global knowledge into intra-cluster learning by decoupling neural networks and utilizing distinct aggregation methods for each submodule, significantly enhancing performance; (2) includes a costeffective online model similarity metric based on dimensionality reduction; (3) incorporates a cluster number fine-tuning module for improved adaptability and scalability in complex, heterogeneous environments. Extensive experiments show that FedAC achieves superior empirical performance, increasing the test accuracy by around 1.82% and 12.67% on CIFAR-10 and CIFAR-100 datasets, respectively, under different non-IID settings compared to SOTA methods.

fedac, knowledge, learning, (12 more...)

2403.1646

Country:

Asia > China > Hong Kong (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Machine LearningMay-29-2019

Model Similarity Mitigates Test Set Overuse

Mania, Horia, Miller, John, Schmidt, Ludwig, Hardt, Moritz, Recht, Benjamin

Excessive reuse of test data has become commonplace in today's machine learning workflows. Popular benchmarks, competitions, industrial scale tuning, among other applications, all involve test data reuse beyond guidance by statistical confidence bounds. Nonetheless, recent replication studies give evidence that popular benchmarks continue to support progress despite years of extensive reuse. We proffer a new explanation for the apparent longevity of test data: Many proposed models are similar in their predictions and we prove that this similarity mitigates overfitting. Specifically, we show empirically that models proposed for the ImageNet ILSVRC benchmark agree in their predictions well beyond what we can conclude from their accuracy levels alone. Likewise, models created by large scale hyperparameter search enjoy high levels of similarity. Motivated by these empirical observations, we give a non-asymptotic generalization bound that takes similarity into account, leading to meaningful confidence bounds in practical settings.

artificial intelligence, machine learning, similarity, (18 more...)

arXiv.org Machine Learning

1905.1258

Country: North America > United States (0.46)

Genre: Research Report (0.85)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)